Understanding how homicide rates changed before the modern era requires the help of historians and archivists. Manuel Eisner, a criminology professor at the University of Cambridge, and his colleagues published the Historical Violence Database, a compilation of data on long-term trends in homicide rates, in addition to qualitative information such as the cause of death, perpetrator and victim. This database is limited to countries with relatively complete historical records on violence and crime – mainly Western Europe and the US. We will use here a version of their dataset provided by the OurWorldInData project, based at the University of Oxford.
Starting in the second half of the nineteenth century, Western European regions have consistent police records of those accused of murder or manslaughter and annual counts of homicide victims. To go back further in time, reaching as far back as the thirteenth century, Eisner collected estimates (from historical records of coroner reports, court trials, and the police) of homicide rates made in over ninety publications by scholars.
Homicide rates – measured as the number of homicides per 100,000 individuals – up to 1990 are sourced from Eisner’s (2003) publication and the Historical Violence Database.
library(tidyverse)
You should always interrogate the source of your data and ask: who compiled it, on what basis, what is missing, and how representative are the data? You can consult the OurWorldInData project as well as Eisner’s publications for initial insights.
# download the dataset
download.file("https://raw.githubusercontent.com/adivea/r-history/main/episodes/data/homicide-rates-across-western-europe.csv", destfile = "data/homicide-rates-across-western-europe.csv")
# load the data into R
Western_Europe <- read_csv("data/homicide-rates-across-western-europe.csv")
How clean and analysis-ready is the dataset? Do you understand what the column names represent? What is hiding under “Entity”? What is the difference between rate and homicide number?
head(Western_Europe)
# A tibble: 6 × 4
Entity Code Year `Homicide rate in Europe over long-term (per 100,000)`
<chr> <chr> <dbl> <dbl>
1 England <NA> 1300 23
2 England <NA> 1550 7
3 England <NA> 1625 6
4 England <NA> 1675 4
5 England <NA> 1725 2
6 England <NA> 1775 1
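One way to start answering the questions above is to summarise the table rather than only reading its first rows. The sketch below is a minimal illustration, not part of the original lesson: `toy` is a hypothetical stand-in built from the six England rows printed above, and the same calls could be run on `Western_Europe` once it is loaded.

```r
library(dplyr)

# Hypothetical stand-in for Western_Europe: only the six rows shown by head()
toy <- tibble::tibble(
  Entity = rep("England", 6),
  Code = NA_character_,
  Year = c(1300, 1550, 1625, 1675, 1725, 1775),
  `Homicide rate in Europe over long-term (per 100,000)` = c(23, 7, 6, 4, 2, 1)
)

# Which regions hide under "Entity", and how many estimates does each have?
entity_counts <- count(toy, Entity)
print(entity_counts)

# Which span of years does each Entity cover?
year_span <- toy %>%
  group_by(Entity) %>%
  summarise(first_year = min(Year), last_year = max(Year))
print(year_span)

# How many values are missing in each column?
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)
```

On the full dataset, the count per Entity shows how unevenly the regions are documented, which matters when comparing their trends.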
Ok, the data look good except for the column
Homicide rate in Europe over long-term (per 100,000) which
is very long and not very easy to work with.
Use the names() function and the assignment operator (<-) to relabel
this column to homicides_per_100k.
# YOUR CODE
names(Western_Europe)[4] <- "homicides_per_100k"
Now that you have looked at what the data look like and what they represent, and streamlined them, let’s see what big picture they contain.
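As an aside on the renaming step: relabelling by position (`[4]`) silently renames the wrong column if the column order ever changes. A possible alternative, sketched here on a hypothetical two-row stand-in called `toy`, is `rename()` from dplyr, which addresses the column by its old name.

```r
library(dplyr)

# Hypothetical stand-in with the original long column name
toy <- tibble::tibble(
  Entity = "England",
  Year = c(1300, 1550),
  `Homicide rate in Europe over long-term (per 100,000)` = c(23, 7)
)

# rename() takes new_name = old_name; backticks are needed because
# the old name contains spaces and parentheses
toy_renamed <- rename(
  toy,
  homicides_per_100k = `Homicide rate in Europe over long-term (per 100,000)`
)

print(names(toy_renamed))
```

Both approaches give the same result here; the name-based version simply fails with an error instead of renaming the wrong column.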
- Use the ggplot() function and remember the + at the end of the line
- Choose a geom_......() for geometry (hint: points are not great here)
- Put Year on the x axis and the homicides_per_100k column on the y axis
- Map the country column to color
- Set the eval flag so that the code chunk renders when knitted
ggplot(data = Western_Europe) +
  #....YOUR CODE GOES HERE
ggplot(data = Western_Europe) +
geom_line(mapping = aes(x = Year,
y = homicides_per_100k,
color = Entity)) +
labs(x = "Year",
y = "Number of Homicides per 100,000 people",
title = "Homicide rate in Europe from 1300-2000")
Alright, the homicide rates should all be descending over time. What a comfort. But the viz is not super clear. Let’s check the rates for individual countries.
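The impression of a decline can also be checked numerically for each country. The following is a minimal sketch under stated assumptions: `toy` is a hypothetical stand-in holding only the six England values shown earlier, with the column already renamed to `homicides_per_100k`. Run on the full `Western_Europe` table, it would return one row per Entity.

```r
library(dplyr)

# Hypothetical stand-in: the six England estimates printed by head()
toy <- tibble::tibble(
  Entity = "England",
  Year = c(1300, 1550, 1625, 1675, 1725, 1775),
  homicides_per_100k = c(23, 7, 6, 4, 2, 1)
)

trend_check <- toy %>%
  arrange(Entity, Year) %>%   # make sure rows are in chronological order
  group_by(Entity) %>%
  summarise(
    earliest_rate = first(homicides_per_100k),
    latest_rate = last(homicides_per_100k),
    # TRUE only if the rate never rises from one estimate to the next
    never_rises = all(diff(homicides_per_100k) <= 0)
  )

print(trend_check)
```

A region can show an overall decline between its earliest and latest estimate while `never_rises` is FALSE, which is exactly the kind of detail the separate plots below make visible.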
You can visualize each country’s trend in a separate plot by adding
an extra layer to the ggplot, facet_wrap(), and
feeding it the country column. If in doubt, check your ggplot tutorial
and your country column name for exact usage.
- Add facet_wrap() after the specification of geometry to split countries into separate charts
- Look up the facet_wrap() arguments in R or online.
ggplot(data = Western_Europe) +
  #... YOUR CODE
ggplot(data = Western_Europe) +
geom_line(mapping = aes(x = Year,
y = homicides_per_100k,
color = Entity)) +
facet_wrap( ~ Entity, ncol = 2) +
labs(x = "Year",
y = "Number of Homicides per 100,000 people",
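title = "Homicide rate in Europe from 1300-2000")
Two finishing touches remain: move the legend to the bottom and give it the title “Country”. For the former, use theme(), and for the
latter, try googling. Knowing how to ask a question to zoom in on the
problem is a skill that requires practice.
ggplot(data = Western_Europe) +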
geom_line(mapping = aes(x = Year,
y = homicides_per_100k,
color = Entity)) +
facet_wrap( ~ Entity, ncol = 2) +
labs(x = "Year",
y = "Number of Homicides per 100,000 people",
title = "Homicide rate in Europe from 1300-2000",
color = "Country") +
theme(legend.position = "bottom")
For this task, download the R Markdown script that generated this
lesson. Its extension is .Rmd and it is a flexible type of document that
allows you to seamlessly combine executable R code, and its output, with
text in a single document. It can look neat and be useful for presenting
one’s research as well as creating assignments for students. If you want
to learn more about the format, consult episode 06 among the training
guides. Once you have the original script, start in the top section, the
YAML header, and then move down.
- Add a floating table of contents by turning two of the arguments to true,
- add chunk-names and edit flags in your R chunks, and
- add a timestamp to show when the document was last updated. (Hint: you will need to add a date to the YAML header, and then google “timestamp in rmarkdown” to figure out the format of the query. Look for answers from stackoverflow.com.)
Finally, enjoy your accomplishments and ponder the main question behind this data: are we more civilized today?
Compare the trends in homicide with the pattern of reign duration among Danish rulers through time. How would you characterize the relationship between the two timeseries?
Well done!